Smart Paper Notes

Coupling Vision and Proprioception for Navigation of Legged Robots

Zipeng Fu , Ashish Kumar , Ananye Agarwal , Haozhi Qi , Jitendra Malik , Deepak Pathak

Categories: Robotics | Artificial Intelligence | Computer Vision | Machine Learning

2021-12-03

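We exploit the complementary strengths of vision and proprioception to achieve point-goal navigation in a legged robot. Legged systems can traverse more complex terrain than wheeled robots, but to fully exploit this capability, the high-level path planner in the navigation system must be aware of the walking capabilities of the low-level locomotion policy on varying terrains. We achieve this by using proprioceptive feedback to estimate the safe operating limits of the walking policy and to sense unexpected obstacles and terrain properties, such as the smoothness or softness of the ground, that may be missed by vision. The navigation system uses onboard cameras to generate an occupancy map and a corresponding cost map to reach the goal. An FMM (Fast Marching Method) planner then generates a target path. The velocity command generator takes this path as input and produces the desired velocity for the locomotion policy, subject to additional constraints from the safety advisor: unexpected obstacles and terrain-determined speed limits. We show superior performance compared with wheeled robot (LoCoBot) baselines and with other baselines that have disjoint high-level planning and low-level control. We also show the real-world deployment of our system on a quadruped robot with onboard sensors and compute. Videos at https://navigation-locomotion.github.io/camera-ready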
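
The pipeline in this abstract (cost map, planner, velocity command with safety limits) can be sketched roughly as follows. This is only an illustration under stated assumptions: a Dijkstra wavefront on a 4-connected grid stands in for the Fast Marching Method, `None` marks an occupied cell, and the function names and the single scalar speed limit are hypothetical rather than taken from the paper.

```python
import heapq

NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))

def distance_to_goal(cost, goal):
    """Dijkstra wavefront from the goal (a stand-in for FMM, not FMM itself).

    cost[r][c] is the price of standing on a free cell; None marks an occupied cell.
    """
    h, w = len(cost), len(cost[0])
    dist = [[float("inf")] * w for _ in range(h)]
    dist[goal[0]][goal[1]] = 0.0
    heap = [(0.0, goal)]
    while heap:
        d, (r, c) = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale heap entry
        for dr, dc in NEIGHBOURS:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and cost[nr][nc] is not None:
                nd = d + cost[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, (nr, nc)))
    return dist

def next_step(dist, pos):
    """Pick the in-bounds neighbour with the smallest distance-to-goal."""
    h, w = len(dist), len(dist[0])
    options = [(dist[pos[0] + dr][pos[1] + dc], (pos[0] + dr, pos[1] + dc))
               for dr, dc in NEIGHBOURS
               if 0 <= pos[0] + dr < h and 0 <= pos[1] + dc < w]
    return min(options)[1]

def velocity_command(desired_speed, speed_limit):
    """Clamp the planner's desired speed to a limit from a hypothetical safety advisor."""
    return min(desired_speed, speed_limit)
```

In this sketch, an obstacle sensed by proprioception would be handled by setting the corresponding cell to `None` and recomputing the distance field, which mirrors the idea of the safety advisor feeding unexpected obstacles back into the map.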

translated by Google Translate

ReduNet: A White-box Deep Network from the Principle of Maximizing Rate Reduction

Kwan Ho Ryan Chan , Yaodong Yu , Chong You , Haozhi Qi , John Wright , Yi Ma

Categories: Machine Learning | Computer Vision | Machine Learning (Statistics)

2021-05-21

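This work attempts to provide a plausible theoretical framework that interprets modern deep (convolutional) networks from the principles of data compression and discriminative representation. We argue that, for high-dimensional multi-class data, the optimal linear discriminative representation maximizes the coding rate difference between the whole dataset and the average of all its subsets. We show that the basic iterative gradient ascent scheme for optimizing the rate reduction objective naturally leads to a multi-layer deep network, named ReduNet, which shares common characteristics of modern deep networks. The deep layered architecture, the linear and nonlinear operators, and even the parameters of the network are all explicitly constructed layer by layer via forward propagation, although they remain amenable to fine-tuning via back propagation. All components of the resulting "white-box" network have precise optimization, statistical, and geometric interpretations. Moreover, all linear operators of the derived network naturally become multi-channel convolutions when we enforce classification to be shift-invariant. The derivation in the invariant setting suggests a trade-off between sparsity and invariance, and also indicates that such a deep convolutional network is significantly more efficient to construct and learn in the spectral domain. Our preliminary simulations and experiments clearly verify the effectiveness of both the rate reduction objective and the associated ReduNet. All code and data are available at https://github.com/ma-lab-berkeley.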
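
The abstract does not spell out the objective, so the following is a hedged sketch that assumes the commonly cited coding-rate form R(Z) = 1/2 log det(I + d/(m eps^2) Z Z^T) for m samples of dimension d, with hard class memberships. The precision `eps`, the function names, and the pure-Python Cholesky helper are illustrative choices, not the authors' code.

```python
import math

def logdet_spd(a):
    """Log-determinant of a symmetric positive-definite matrix via Cholesky."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    total = 0.0
    for i in range(n):
        for j in range(i + 1):
            s = a[i][j] - sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(s)
                total += 2.0 * math.log(low[i][j])
            else:
                low[i][j] = s / low[j][j]
    return total

def coding_rate(samples, eps=0.5):
    """R(Z) = 1/2 logdet(I + d/(m eps^2) Z Z^T); samples is a list of d-dim vectors."""
    m, d = len(samples), len(samples[0])
    scale = d / (m * eps ** 2)
    mat = [[(1.0 if i == j else 0.0) + scale * sum(z[i] * z[j] for z in samples)
            for j in range(d)] for i in range(d)]
    return 0.5 * logdet_spd(mat)

def rate_reduction(samples, labels, eps=0.5):
    """Coding rate of the whole set minus the size-weighted average rate of each class."""
    m = len(samples)
    parts = 0.0
    for c in set(labels):
        subset = [z for z, y in zip(samples, labels) if y == c]
        parts += len(subset) / m * coding_rate(subset, eps)
    return coding_rate(samples, eps) - parts

# Two classes on orthogonal axes versus two classes collapsed onto one axis.
orthogonal = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]
collapsed = [[1.0, 0.0]] * 4
labels = [0, 0, 1, 1]
gap_orthogonal = rate_reduction(orthogonal, labels)
gap_collapsed = rate_reduction(collapsed, labels)
```

Under these assumptions the orthogonal arrangement yields a strictly larger rate reduction than the collapsed one, which is the behaviour the objective is meant to reward.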

Deformable Convolutional Networks

Jifeng Dai , Haozhi Qi , Yuwen Xiong , Yi Li , Guodong Zhang , Han Hu , Yichen Wei

Categories:

2017-03-17

Convolutional neural networks (CNNs) are inherently limited in their ability to model geometric transformations because of the fixed geometric structures in their building modules. In this work, we introduce two new modules to enhance the transformation modeling capability of CNNs, namely, deformable convolution and deformable RoI pooling. Both are based on the idea of augmenting the spatial sampling locations in the modules with additional offsets and learning the offsets from the target tasks, without additional supervision. The new modules can readily replace their plain counterparts in existing CNNs and can be easily trained end-to-end by standard back-propagation, giving rise to deformable convolutional networks. Extensive experiments validate the performance of our approach. For the first time, we show that learning dense spatial transformation in deep CNNs is effective for sophisticated vision tasks such as object detection and semantic segmentation. The code is released at https://github.com/msracver/Deformable-ConvNets.
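
To make the idea of offset sampling locations concrete, here is a minimal single-location sketch of a 3x3 deformable convolution. It assumes bilinear interpolation for fractional positions and zero padding outside the image, and it takes the offsets as plain inputs; in the method described above they would be learned from the task. All names are illustrative.

```python
import math

def bilinear(img, y, x):
    """Bilinearly interpolate img (a list of rows) at real-valued (y, x); zero outside."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                value += weight * img[yy][xx]
    return value

def deformable_conv_at(img, kernel, offsets, cy, cx):
    """Response of a 3x3 deformable convolution centred at (cy, cx).

    offsets[i][j] is the (dy, dx) shift added to the regular grid position of tap (i, j).
    """
    out = 0.0
    for i in range(3):
        for j in range(3):
            dy, dx = offsets[i][j]
            out += kernel[i][j] * bilinear(img, cy + i - 1 + dy, cx + j - 1 + dx)
    return out

# A 4x4 ramp image, an all-ones kernel, and two offset fields.
image = [[float(r * 4 + c) for c in range(4)] for r in range(4)]
ones = [[1.0] * 3 for _ in range(3)]
no_shift = [[(0.0, 0.0)] * 3 for _ in range(3)]
half_right = [[(0.0, 0.5)] * 3 for _ in range(3)]
plain = deformable_conv_at(image, ones, no_shift, 1, 1)
shifted = deformable_conv_at(image, ones, half_right, 1, 1)
```

With all offsets at zero the function reduces to an ordinary 3x3 convolution at that location, which is why such a module can replace its plain counterpart.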

On the Convergence Theory of Meta Reinforcement Learning with Personalized Policies

Haozhi Wang , Qing Wang , Yunfeng Shao , Dong Li , Jianye Hao , Yinchuan Li

Categories: Artificial Intelligence | Machine Learning

2022-09-21

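Modern meta-reinforcement learning (Meta-RL) methods are mainly developed on top of model-agnostic meta-learning, which performs policy gradient steps across tasks to maximize policy performance. However, the gradient conflict problem is still poorly understood in Meta-RL, and it may lead to performance degradation when distinct tasks are encountered. To tackle this challenge, this paper proposes a novel personalized Meta-RL (pMeta-RL) algorithm, which aggregates task-specific personalized policies to update a meta-policy used for all tasks, while maintaining personalized policies that maximize the average return of each task under the constraint of the meta-policy. We also provide a theoretical analysis in the tabular setting, which demonstrates the convergence of our pMeta-RL algorithm. Moreover, we extend the proposed pMeta-RL algorithm to a deep network version based on soft actor-critic, making it suitable for continuous control tasks. Experimental results show that the proposed algorithms outperform previous Meta-RL algorithms on the Gym and MuJoCo suites.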

Learning Temporal Consistency for Source-Free Video Domain Adaptation

Yuecong Xu , Jianfei Yang , Haozhi Cao , Keyu Wu , Wu Min , Zhenghua Chen

Categories: Computer Vision

2022-03-09

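Video-based Unsupervised Domain Adaptation (VUDA) methods improve the robustness of video models, enabling them to be applied to action recognition tasks across different environments. However, these methods require constant access to source data during the adaptation process. Yet in many real-world applications, the subjects and scenes in the source video domain should be irrelevant to those in the target video domain. With the increasing emphasis on data privacy, methods that require access to source data raise serious privacy issues. To address this concern, a more practical domain adaptation scenario is formulated as Source-Free Video-based Domain Adaptation (SFVDA). Although a few methods exist for Source-Free Domain Adaptation (SFDA) on image data, they yield degraded performance in SFVDA because of the multi-modal nature of videos, which carry additional temporal features. In this paper, we propose a novel Attentive Temporal Consistent Network (ATCoN) that addresses SFVDA by learning temporal consistency, guaranteed by two novel consistency objectives, namely feature consistency and source prediction consistency, both enforced across local temporal features. ATCoN further constructs effective overall temporal features by attending to local temporal features based on prediction confidence. Empirical results demonstrate the state-of-the-art performance of ATCoN across various cross-domain action recognition benchmarks.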

Aligning Correlation Information for Domain Adaptation in Action Recognition

Yuecong Xu , Jianfei Yang , Haozhi Cao , Kezhi Mao , Jianxiong Yin , Simon See

Categories: Computer Vision | Artificial Intelligence

2021-07-11

Domain adaptation (DA) approaches address domain shift and enable networks to be applied to different scenarios. Although various image DA approaches have been proposed in recent years, there is limited research on video DA. This is partly due to the complexity of adapting the different modalities of features in videos, which include the correlation features extracted as long-term dependencies of pixels across spatiotemporal dimensions. The correlation features are highly associated with action classes and have proven effective for accurate video feature extraction in the supervised action recognition task. Yet correlation features of the same action differ across domains because of domain shift. We therefore propose a novel Adversarial Correlation Adaptation Network (ACAN) to align action videos by aligning pixel correlations. ACAN aims to minimize the cross-domain difference in the distribution of correlation information, termed the Pixel Correlation Discrepancy (PCD). Additionally, video DA research is limited by the lack of cross-domain video datasets with larger domain shifts. We therefore introduce a novel HMDB-ARID dataset with a larger domain shift caused by a larger statistical difference between domains. This dataset is built in an effort to leverage current datasets for dark video classification. Empirical results demonstrate the state-of-the-art performance of our proposed ACAN on both existing video DA datasets and the new one.

ARID: A New Dataset for Recognizing Action in the Dark

Yuecong Xu , Jianfei Yang , Haozhi Cao , Kezhi Mao , Jianxiong Yin , Simon See

Categories: Computer Vision

2020-06-06

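The task of action recognition in dark videos is useful in various scenarios, such as night surveillance and self-driving at night. Although progress has been made on action recognition for videos under normal illumination, few have studied action recognition in the dark. This is partly due to the lack of sufficient datasets for such a task. In this paper, we explore the task of action recognition in dark videos. We bridge the data gap for this task by collecting a new dataset: the Action Recognition in the Dark (ARID) dataset. It consists of over 3,780 video clips with 11 action categories. To the best of our knowledge, it is the first dataset focused on human actions in dark videos. To gain further understanding of ARID, we analyze the dataset in detail and demonstrate its necessity over synthetic dark videos. Additionally, we benchmark the performance of several current action recognition models on our dataset and explore potential methods for improving their performance. Our results show that current action recognition models and frame enhancement methods may not be effective solutions for action recognition in dark videos.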

RELIANT: Fair Knowledge Distillation for Graph Neural Networks

Yushun Dong , Binchi Zhang , Yiling Yuan , Na Zou , Qi Wang , Jundong Li

Categories: Machine Learning

2023-01-03

Graph Neural Networks (GNNs) have shown satisfying performance on various graph learning tasks. To achieve better fitting capability, most GNNs have a large number of parameters, which makes them computationally expensive. It is therefore difficult to deploy them onto edge devices with scarce computational resources, e.g., mobile phones and wearable smart devices. Knowledge Distillation (KD) is a common solution for compressing GNNs, in which a lightweight model (i.e., the student model) is encouraged to mimic the behavior of a computationally expensive GNN (i.e., the teacher GNN model). Nevertheless, most existing GNN-based KD methods lack fairness consideration. As a consequence, the student model usually inherits and even exaggerates the bias from the teacher GNN. To handle this problem, we take initial steps towards fair knowledge distillation for GNNs. Specifically, we first formulate a novel problem of fair knowledge distillation for GNN-based teacher-student frameworks. Then we propose a principled framework named RELIANT to mitigate the bias exhibited by the student model. Notably, the design of RELIANT is decoupled from any specific teacher and student model structures, and thus can be easily adapted to various GNN-based KD frameworks. We perform extensive experiments on multiple real-world datasets, which corroborate that RELIANT achieves less biased GNN knowledge distillation while maintaining high prediction utility.
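
The teacher-student setup described above is often trained with a soft-target loss. The sketch below shows only that generic distillation term (KL divergence between temperature-softened outputs), assuming per-node class logits are already available; it does not implement RELIANT's fairness mechanism, which the abstract does not specify. The function names and the default temperature are illustrative.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax over temperature-scaled logits."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by temperature squared."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)
    return temperature ** 2 * kl

# The loss vanishes when the student reproduces the teacher's logits for a node.
matched = distillation_loss([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])
mismatched = distillation_loss([3.0, 2.0, 1.0], [1.0, 2.0, 3.0])
```

A fairness-aware variant would add a further term or module on top of this loss; what that looks like for RELIANT is left open here.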

Understanding Imbalanced Semantic Segmentation Through Neural Collapse

Zhisheng Zhong , Jiequan Cui , Yibo Yang , Xiaoyang Wu , Xiaojuan Qi , Xiangyu Zhang , Jiaya Jia

Categories: Computer Vision | Machine Learning

2023-01-03

A recent study has shown a phenomenon called neural collapse in that the within-class means of features and the classifier weight vectors converge to the vertices of a simplex equiangular tight frame at the terminal phase of training for classification. In this paper, we explore the corresponding structures of the last-layer feature centers and classifiers in semantic segmentation. Based on our empirical and theoretical analysis, we point out that semantic segmentation naturally brings contextual correlation and imbalanced distribution among classes, which breaks the equiangular and maximally separated structure of neural collapse for both feature centers and classifiers. However, such a symmetric structure is beneficial to discrimination for the minor classes. To preserve these advantages, we introduce a regularizer on feature centers to encourage the network to learn features closer to the appealing structure in imbalanced semantic segmentation. Experimental results show that our method can bring significant improvements on both 2D and 3D semantic segmentation benchmarks. Moreover, our method ranks 1st and sets a new record (+6.8% mIoU) on the ScanNet200 test leaderboard. Code will be available at https://github.com/dvlab-research/Imbalanced-Learning.
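
The simplex equiangular tight frame mentioned above has a standard closed form: for K classes, the vectors are the rows of sqrt(K/(K-1)) (I - (1/K) 1 1^T). The sketch below builds that frame in pure Python and measures its geometry; it illustrates the target structure only, not the regularizer proposed in the paper, and the function names are illustrative.

```python
import math

def simplex_etf(k):
    """Rows are the k vertices of a simplex equiangular tight frame in R^k."""
    scale = math.sqrt(k / (k - 1))
    return [[scale * ((1.0 if i == j else 0.0) - 1.0 / k) for j in range(k)]
            for i in range(k)]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Every vertex has unit norm and every pair shares the same cosine, -1/(k-1).
frame = simplex_etf(4)
norms = [math.sqrt(sum(a * a for a in row)) for row in frame]
pair_cosines = [cosine(frame[i], frame[j])
                for i in range(4) for j in range(i + 1, 4)]
```

Equal pairwise angles at the largest possible separation are what "equiangular and maximally separated" refers to, and class imbalance is what the abstract says breaks this symmetry.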

Argoverse 2: Next Generation Datasets for Self-Driving Perception and Forecasting

Benjamin Wilson , William Qi , Tanmay Agarwal , John Lambert , Jagjeet Singh , Siddhesh Khandelwal , Bowen Pan , Ratnesh Kumar , Andrew Hartnett , Jhony Kaesemodel Pontes

Categories: Computer Vision | Artificial Intelligence | Machine Learning | Robotics

2023-01-02

We introduce Argoverse 2 (AV2) - a collection of three datasets for perception and forecasting research in the self-driving domain. The annotated Sensor Dataset contains 1,000 sequences of multimodal data, encompassing high-resolution imagery from seven ring cameras, and two stereo cameras in addition to lidar point clouds, and 6-DOF map-aligned pose. Sequences contain 3D cuboid annotations for 26 object categories, all of which are sufficiently-sampled to support training and evaluation of 3D perception models. The Lidar Dataset contains 20,000 sequences of unlabeled lidar point clouds and map-aligned pose. This dataset is the largest ever collection of lidar sensor data and supports self-supervised learning and the emerging task of point cloud forecasting. Finally, the Motion Forecasting Dataset contains 250,000 scenarios mined for interesting and challenging interactions between the autonomous vehicle and other actors in each local scene. Models are tasked with the prediction of future motion for "scored actors" in each scenario and are provided with track histories that capture object location, heading, velocity, and category. In all three datasets, each scenario contains its own HD Map with 3D lane and crosswalk geometry - sourced from data captured in six distinct cities. We believe these datasets will support new and existing machine learning research problems in ways that existing datasets do not. All datasets are released under the CC BY-NC-SA 4.0 license.
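
As a rough illustration of the forecasting task described above, the sketch below models a track history with the fields the abstract names (location, heading, velocity, category) and extrapolates it with a constant-velocity baseline. The `TrackState` record, its field names, and the default time step are hypothetical; this is not the dataset's API or file schema.

```python
from dataclasses import dataclass

@dataclass
class TrackState:
    """Hypothetical per-timestep record mirroring the fields named in the abstract."""
    x: float         # position east, metres
    y: float         # position north, metres
    heading: float   # radians
    vx: float        # velocity east, metres per second
    vy: float        # velocity north, metres per second
    category: str    # object category label

def constant_velocity_forecast(history, horizon, dt=0.1):
    """Extrapolate the last observed state for `horizon` steps of `dt` seconds each."""
    last = history[-1]
    return [(last.x + last.vx * dt * k, last.y + last.vy * dt * k)
            for k in range(1, horizon + 1)]

# A vehicle moving east at 2 m/s, forecast three steps ahead at 0.5 s per step.
history = [
    TrackState(0.0, 2.0, 0.0, 2.0, 0.0, "vehicle"),
    TrackState(1.0, 2.0, 0.0, 2.0, 0.0, "vehicle"),
]
future = constant_velocity_forecast(history, horizon=3, dt=0.5)
```

A learned model would replace the extrapolation step; a constant-velocity baseline of this kind is a common reference point for such models.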
